ESTIMATING MARGINAL SURVIVAL FUNCTION BY
ADJUSTING FOR DEPENDENT CENSORING USING MANY
COVARIATES

By Donglin Zeng 

University of North Carolina at Chapel Hill 

One goal in survival analysis of right-censored data is to estimate 
the marginal survival function in the presence of dependent censoring. 
When many auxiliary covariates are sufficient to explain the depen- 
dent censoring, estimation based on either a semiparametric model 
or a nonparametric model of the conditional survival function can 
be problematic due to the high dimensionality of the auxiliary infor- 
mation. In this paper, we use two working models to condense these 
high-dimensional covariates in dimension reduction; then an estimate 
of the marginal survival function can be derived nonparametrically 
in a low-dimensional space. We show that such an estimator has the 
following double robust property: when either working model is cor- 
rect, the estimator is consistent and asymptotically Gaussian; when 
both working models are correct, the asymptotic variance attains the 
efficiency bound. 

1. Introduction. Right-censored data with dependent censoring are com- 
mon in many epidemiological studies. Such data consist of n i.i.d. copies of 
the observation $(Y = T \wedge C,\ R = I(T \le C),\ L)$, where T is the failure time
of interest, C is the right censoring time, and L includes the covariate in- 
formation. Usually, the covariates L contain not only subject demographic 
information and disease history, but also much other auxiliary information 
which researchers are not primarily interested in but which is informative 
in predicting subjects' failure time or explaining why subjects drop out, or 
both. For example, in a typical medical study, L may contain the patient's 
willingness to participate in the study, the patient's accessibility to hospi- 
tals, the social support from the patient's family members, or the patient's 
genetic information, and so on. When much auxiliary information has been
collected, in practice, it is safe to assume that L is sufficient to explain the 
dependence between T and C. Equivalently, T and C are independent when 
conditional on L. 

The purpose of this article is to estimate the marginal survival function 
of T using right-censored data. A standard estimate is the Kaplan-Meier 
estimate. However, it is well known that, when T and C are dependent, this 
estimator is inconsistent. Another intuitive approach to estimating the
survival function of T is to estimate the conditional distribution of T given L
using a semiparametric model [e.g., the Cox proportional hazard model, Cox
(1972), or the proportional odds model, Bennett (1983)] or via nonparametric
estimation approaches such as using a local likelihood function
[Tibshirani and Hastie (1987)]. Then the estimate of the marginal survival
function of T is simply the empirical average of the conditional distribution 
of T given L over all the observed covariates. However, the above approaches 
can be problematic when the auxiliary covariates, L, consist of many vari- 
ables. This is because when L has at least three dimensions, nonparametric 
estimation of the distribution of T given L is infeasible in a moderate-sized 
sample due to the curse of dimensionality; and in any semiparametric model, 
the parametric function of L in the model of T given L is likely to be mis- 
specified. Consequently, these intuitive approaches bias the estimation of the 
survival function of T. 

To overcome the limitations of the above intuitive approaches, in this article
we propose two working models, one for the lifetime T and one for the censoring
time C, given all the covariates L. Then two-dimensional condensed information
of L is extracted from the working models and used as the new covariates
in place of L. The estimator of the survival function is obtained by maxi- 
mizing a pseudo-likelihood function nonparametrically in the space with the 
reduced dimension. It is shown that if either working model is correct, the 
estimator of the marginal survival function is consistent and asymptotically 
Gaussian; if both working models are correct, the asymptotic variance of 
the estimator attains the generalized Cramér-Rao bound of the full model
space [cf. Bickel, Klaassen, Ritov and Wellner (1993)]. The first property is 
named "double robustness" by Robins, Rotnitzky and van der Laan (2000), 
since the estimator remains consistent if one working model is misspecified 
but the other one is correct. 

The method of using the condensed information of the high-dimensional 
covariates in the estimation dates back to the propensity score approach by 
Rubin (1976) in a simple regression, where the propensity score was defined 
as the predicted missing probability given all the covariates. Little (1986) 
further combined the propensity score and the mean score, the latter of 
which was defined as the predicted mean response given all the covariates, 
to estimate the population mean in a survey study. Such methods have been 
recently developed and generalized to study dependent censoring in semi-
parametric regression and survival analysis by Robins and others [Rotnitzky 
and Robins (1995), Robins, Rotnitzky and van der Laan (2000) and Scharf- 
stein and Robins (2002)]. Although all the above mentioned approaches in- 
cluding ours pursue the summary information of the covariates, sometimes 
referred to as the propensity score or risk score, using the working mod- 
els for T and C given L, the estimation approach we take is much different 
from theirs. Robins, Rotnitzky and van der Laan's approach is to begin with 
an inverse-weighted estimating equation, where only the complete observa- 
tions are used in the estimating equation and each complete observation is 
weighted with the inverse of the probability of not being censored; the final
estimating equation for the marginal survival distribution is then obtained by
subtracting from the inverse-weighted estimating equation its projection on
the score tangent space. However, the method we propose in this article is a
purely likelihood-based approach: we first obtain the condensed covariates 
by optimizing the pseudolikelihood functions based on the working models; 
we then optimize another pseudolikelihood function to derive the estimate 
of T's survival function. Therefore, the likelihood-based approach we take 
only involves simple optimization steps and the estimate turns out to have a 
simple expression; by contrast, the approach in Robins, Rotnitzky and van 
der Laan (2000) requires a practical user to have knowledge of the projection 
on the score space. 

This article is organized as follows: In Section 2 we give the details of 
estimating the marginal survival function; the asymptotic properties of our 
estimator are then given in Section 3, where we also provide an algorithm 
to estimate the asymptotic variance; the numerical results from a simula- 
tion study are given in Section 4; finally, the article concludes with some 
discussion. Most of the proofs in this article are deferred to the Appendix. 

2. Estimation. Under the assumption that T and C are independent 
given L, the observed likelihood function for n observations can be written
as

$$\prod_{i=1}^{n}\Bigl[h_{T|L}(Y_i|L_i)^{R_i}\,e^{-H_{T|L}(Y_i|L_i)}\,h_{C|L}(Y_i|L_i)^{1-R_i}\,e^{-H_{C|L}(Y_i|L_i)}\,f_L(L_i)\Bigr],$$

where $h_{T|L}(\cdot|L)$ and $h_{C|L}(\cdot|L)$ are the hazard rate functions for T and C
given L, respectively; $H_{T|L}(\cdot|L)$ and $H_{C|L}(\cdot|L)$ are their respective
cumulative hazard functions. Our estimation procedure consists of the following
steps.

Step 1. We propose two working models, one for the lifetime T and one for the
censoring time C given L. Our working models for T given L and C given
L are Cox's proportional hazard models; that is, we tentatively assume that

$$h_{T|L}(y|l) = \lambda_T(y)\,e^{\beta' l}, \qquad h_{C|L}(y|l) = \lambda_C(y)\,e^{\gamma' l}$$

for some unknown functions $\lambda_T(\cdot)$, $\lambda_C(\cdot)$ and some parameters $(\beta,\gamma)$.

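Step 2. We derive the estimator of $(\beta,\gamma)$ simply by performing Cox's
regressions, or equivalently, we maximize the following pseudo-log partial
likelihood functions:

$$l_1^{(n)}(\beta) = \frac{1}{n}\sum_{i=1}^{n} R_i\Bigl[\beta' L_i - \log\sum_{Y_j \ge Y_i} e^{\beta' L_j}\Bigr],$$

$$l_2^{(n)}(\gamma) = \frac{1}{n}\sum_{i=1}^{n} (1-R_i)\Bigl[\gamma' L_i - \log\sum_{Y_j \ge Y_i} e^{\gamma' L_j}\Bigr],$$

to estimate $\beta$ and $\gamma$, respectively. We denote the estimators as $(\hat\beta_n,\hat\gamma_n)$. It
will be shown in the next section that there exist two constants $\beta^*$ and $\gamma^*$
such that $\hat\beta_n$ and $\hat\gamma_n$ converge to $\beta^*$ and $\gamma^*$ in probability, respectively.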
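
As an illustration of Step 2, the objective can be evaluated directly. The sketch below assumes the objective has the usual Cox partial-likelihood form, with the risk set of subject i taken as all subjects j with Y_j >= Y_i; the function name and argument layout are hypothetical, and in practice the maximization would be delegated to an existing Cox regression routine.

```python
import math

def cox_log_partial_likelihood(coef, times, events, covariates):
    """Return (1/n) * sum_i events[i] * (coef'L_i - log sum_{j: Y_j >= Y_i} exp(coef'L_j)).

    times      -- observed times Y_i
    events     -- indicators: R_i for the model of T, 1 - R_i for the model of C
    covariates -- list of covariate vectors L_i
    """
    n = len(times)
    # Linear predictor coef'L_j for every subject.
    scores = [sum(c * x for c, x in zip(coef, row)) for row in covariates]
    total = 0.0
    for i in range(n):
        if events[i]:
            # Risk set: subjects still under observation at time Y_i.
            risk = sum(math.exp(scores[j]) for j in range(n) if times[j] >= times[i])
            total += scores[i] - math.log(risk)
    return total / n
```

Passing the indicators R_i gives the objective for the lifetime model; passing 1 - R_i gives the objective for the censoring model, so one function serves both working models.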

Step 3. Acting as if the two limit constants $\beta^*$ and $\gamma^*$ were known, we
obtain the estimator of the hazard rate function of T given $(\beta^{*\prime}L,\gamma^{*\prime}L)$
as follows. Denote $Z^* = (\beta^{*\prime}L,\gamma^{*\prime}L)$. When either of the working models
is right, it will be shown in Lemma 3.1 that T and C are independent given $Z^*$.
In other words, the two-dimensional covariate $Z^*$ is sufficient to
explain the dependence between T and C. Therefore, we replace the covariates
L by $Z^*$ in the observations and obtain a reduced dataset
$(Y_i, R_i, Z_i^* = (\beta^{*\prime}L_i,\gamma^{*\prime}L_i))$, $i = 1,\ldots,n$. The likelihood function for
this reduced data can be verified to be

$$\prod_{i=1}^{n}\Bigl[h_{T|Z^*}(Y_i|Z_i^*)^{R_i}\,e^{-H_{T|Z^*}(Y_i|Z_i^*)}\,h_{C|Z^*}(Y_i|Z_i^*)^{1-R_i}\,e^{-H_{C|Z^*}(Y_i|Z_i^*)}\,f_{Z^*}(Z_i^*)\Bigr],$$

where $h_{T|Z^*}(\cdot|Z^*)$, $h_{C|Z^*}(\cdot|Z^*)$ are the hazard rate functions of T and C
given $Z^*$, respectively, and $H_{T|Z^*}(\cdot|Z^*)$, $H_{C|Z^*}(\cdot|Z^*)$ are their corresponding
cumulative hazard functions. So we can estimate $h_{T|Z^*}(y|z)$ by maximizing
a local version of the observed log-likelihood function

$$\sum_{i=1}^{n} K\Bigl(\frac{Z_i^* - z}{a_n}\Bigr)\Bigl[R_i \log h_{T|Z^*}(Y_i|z) - H_{T|Z^*}(Y_i|z)\Bigr],$$

where $K(\cdot,\cdot)$ is a symmetric two-dimensional kernel function and $a_n$ is a
bandwidth to be chosen later. Easy calculation shows that the maximizer
for $h_{T|Z^*}(y|z)$ is an empirical function with a point mass at each observed
$Y_j$ and the mass is equal to
$R_j K((Z_j^* - z)/a_n) \big/ \sum_{Y_m \ge Y_j} K((Z_m^* - z)/a_n)$.
Step 4. Therefore, the estimator for the cumulative hazard function is
given by

$$\hat H_{T|Z^*}(y|z) = \sum_{Y_j \le y} \frac{R_j K((Z_j^* - z)/a_n)}{\sum_{Y_m \ge Y_j} K((Z_m^* - z)/a_n)}.$$

The estimator for the conditional survival function of T given $Z^*$ is then
$\hat S_{T|Z^*}(t|z) = \prod_{s \le t}\bigl(1 - \hat H_{T|Z^*}(\{s\}|z)\bigr)$. Finally, the estimator for the marginal
survival function of T is simply the empirical average of $\hat S_{T|Z^*}(t|z)$ over all
the $Z_i^*$, $i = 1,\ldots,n$. That is, it is equal to

$$\frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{n}\Bigl(1 - \frac{K((Z_j^* - Z_i^*)/a_n)\,I_{Y_j \le t}\,R_j}{\sum_{m=1}^{n} K((Z_m^* - Z_i^*)/a_n)\,I_{Y_m \ge Y_j}}\Bigr).$$

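Step 5. Since the two constants $\beta^*$ and $\gamma^*$ are unknown but can be
consistently estimated by $\hat\beta_n$ and $\hat\gamma_n$, we replace $(\beta^*,\gamma^*)$ with $(\hat\beta_n,\hat\gamma_n)$ in
the last estimator obtained in Step 4. Thus, we obtain an estimator for the
survival function of T as

$$\hat S_n(t) = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{n}\Bigl(1 - \frac{K((\hat Z_j - \hat Z_i)/a_n)\,I_{Y_j \le t}\,R_j}{\sum_{m=1}^{n} K((\hat Z_m - \hat Z_i)/a_n)\,I_{Y_m \ge Y_j}}\Bigr),$$

where $\hat Z_i = (\hat\beta_n' L_i, \hat\gamma_n' L_i)$.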
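
Steps 3 to 5 can be sketched in a few lines of code. The following is a minimal, unoptimized illustration rather than a reference implementation: it assumes the two-dimensional index values (the fitted linear predictors of the two working models) have already been computed for each subject, it uses the kernel exp{-(x1^2 + x2^2)} as an assumed default, and the function names are hypothetical. Ties among the observed times are handled naively.

```python
import math

def gaussian_kernel(u1, u2):
    # Assumed default kernel: K(x1, x2) = exp{-(x1^2 + x2^2)}.
    return math.exp(-(u1 * u1 + u2 * u2))

def marginal_survival(t, times, events, z, bandwidth, kernel=gaussian_kernel):
    """Average over subjects i of the kernel-weighted product-limit estimate at z[i].

    times  -- observed times Y_j
    events -- failure indicators R_j (1 = failure observed)
    z      -- list of pairs (first index, second index), one per subject
    """
    n = len(times)
    total = 0.0
    for i in range(n):
        # Kernel weight of every subject m relative to the evaluation point z[i].
        w = [kernel((z[m][0] - z[i][0]) / bandwidth,
                    (z[m][1] - z[i][1]) / bandwidth) for m in range(n)]
        surv = 1.0
        for j in range(n):
            if events[j] and times[j] <= t:
                # Weighted size of the risk set at time Y_j.
                at_risk = sum(w[m] for m in range(n) if times[m] >= times[j])
                if at_risk > 0.0:
                    surv *= 1.0 - w[j] / at_risk
        total += surv
    return total / n
```

With a very large bandwidth all kernel weights are nearly equal, so the function returns (up to rounding) the Kaplan-Meier estimate; this gives a convenient sanity check for an implementation.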



3. Main results. Before we present the main results of this article, we 
assume the following conditions hold. 

Assumption 3.1. T and C are independent conditional on L. 

Assumption 3.2. Let $\tau$ be the ending time of the study. For any $l$ in the
support of L, the conditional density of (T, C) given $L = l$ is continuously
twice-differentiable in $[0,\infty) \times [0,\tau)$ and its second derivatives are uniformly
bounded. Moreover, the density of L has bounded second derivatives in its support.

Assumption 3.3. There exists an unknown constant $\theta$ such that for
any $l$ in the support of L,

$$\inf P(T > t \mid L = l) > \theta > 0,$$

$$\inf P(C > t \mid L = l) = \inf P(C = \tau \mid L = l) > \theta > 0 \quad \text{a.s.}$$

Assumption 3.4. The kernel function $K(x_1,x_2)$ is continuously twice
differentiable with bounded second derivatives. Moreover, it satisfies

$$K(-x_1,-x_2) = K(x_1,x_2),$$

\V Xj K{ Xl ,x 2 )\ < 3 = 1,2. 

J 1 + xf + x 2 
Assumption 3.5. $(\log a_n)^2/(n a_n^2) \to 0$, $n a_n^2 \to \infty$, $n a_n^4 \to 0$.

Remark 3.1. Assumption 3.3 implies that all the subjects surviving
until $\tau$ will be right-censored at $\tau$, due to the end of the study. In Assumption
3.4, an example of kernel functions satisfying the conditions is
$K(x_1,x_2) = \exp\{-(x_1^2 + x_2^2)\}$ or any symmetric smooth function with bounded support.
The conditions in Assumption 3.5 stipulate the choice of the bandwidth and
control the asymptotic bias of $\hat S_n(t)$ resulting from the kernel estimation.
First, based on Dabrowska (1987), $(\log a_n)^2/(n a_n^2) \to 0$ ensures the uniform
convergence of $\hat H_{T|Z^*}(t|z)$, a type of kernel estimator for the cumulative
hazard function. Second, it is known that for a kernel smoothing estimator
with bandwidth $a_n$ in the two-dimensional real space, the convergence rate
is of the order $\sqrt{n a_n^2}$ and the bias is of the order $a_n^2$. Such bias carries into
the estimator $\hat S_n(t)$. Thus, $n a_n^4 \to 0$ in Assumption 3.5 ensures that the
asymptotic bias of $\sqrt{n}(\hat S_n(t) - S_0(t))$ resulting from the kernel estimation
will be zero. Clearly, one choice of the bandwidth $a_n$ in Assumption 3.5
can be $O(1)\,n^{-\alpha}$ where $\alpha \in (1/4, 1/2)$ and we will use $a_n = O(n^{-1/3})$ in the
subsequent simulation study.
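
The admissible range of the bandwidth exponent can be checked by direct substitution. The short derivation below is a sketch that takes the three conditions of Assumption 3.5 to be $(\log a_n)^2/(n a_n^2) \to 0$, $n a_n^2 \to \infty$ and $n a_n^4 \to 0$, and writes $a_n = c\,n^{-\alpha}$ with a fixed constant $c > 0$ (the symbol $c$ is introduced here only for illustration).

```latex
\begin{align*}
n a_n^2 &= c^2\, n^{1-2\alpha} \to \infty
  &&\Longleftrightarrow\quad \alpha < \tfrac12, \\
n a_n^4 &= c^4\, n^{1-4\alpha} \to 0
  &&\Longleftrightarrow\quad \alpha > \tfrac14, \\
\frac{(\log a_n)^2}{n a_n^2}
  &= \frac{(\log c - \alpha \log n)^2}{c^2\, n^{1-2\alpha}} \to 0
  &&\text{whenever } \alpha < \tfrac12 .
\end{align*}
```

For the choice $\alpha = 1/3$ this gives $n a_n^2 = c^2 n^{1/3}$ and $n a_n^4 = c^4 n^{-1/3}$, so all three conditions hold.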

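3.1. Asymptotic properties of $\hat\beta_n$ and $\hat\gamma_n$.

Theorem 3.1. Under Assumptions 3.1-3.5, there exist $\beta^*$ and $\gamma^*$ such
that

$$\sqrt{n}(\hat\beta_n - \beta^*) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} S_\beta(\beta^*, Y_i, R_i, L_i) + o_p(1),$$

$$\sqrt{n}(\hat\gamma_n - \gamma^*) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} S_\gamma(\gamma^*, Y_i, R_i, L_i) + o_p(1)$$

for some influence functions $S_\beta$ and $S_\gamma$. Thus, both $\sqrt{n}(\hat\beta_n - \beta^*)$ and
$\sqrt{n}(\hat\gamma_n - \gamma^*)$ converge weakly to some multinormal distributions.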

Theorem 3.1 shows that $(\hat\beta_n,\hat\gamma_n)$ converges to some constants even though
using Cox's proportional hazard models as working models may be wrong.
Obviously, if the model of T given L is a Cox's proportional hazard model,
then $\beta^*$ is the correct coefficient of L specified in this model; if the model
of C given L is a Cox's proportional hazard model, then $\gamma^*$ is the correct
coefficient of L specified in this model. Furthermore, we show that, when
either working model is correct, the condensed variables $(\beta^{*\prime}L, \gamma^{*\prime}L)$ are
sufficient to explain the dependence between the lifetime and the censoring
time.
Lemma 3.1. Suppose either of the working models is right, that is, either
the model for T given L is a Cox's proportional hazard model or the model
for C given L is a Cox's proportional hazard model. Let $Z^* = (\beta^{*\prime}L, \gamma^{*\prime}L)$.
Then T and C are independent given $Z^*$, and moreover, the cumulative
hazard function of T given $Z^*$ is equal to

$$\int_0^t \frac{d_u P(T \wedge C \le u, R = 1 \mid Z^* = z)}{P(T \wedge C \ge u \mid Z^* = z)}.$$

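Proof. We only show that the results are true if the working model for
C given L is a Cox's proportional hazard model. For any $t_1, t_2 > 0$,

$$
\begin{aligned}
P(T \le t_1, C \le t_2 \mid Z^*) &= E_{L|Z^*}\bigl[P(T \le t_1 \mid L)\,P(C \le t_2 \mid L)\bigr] \\
&= E_{L|Z^*}\bigl[P(T \le t_1 \mid L)\,P(C \le t_2 \mid Z^*)\bigr] \\
&= P(C \le t_2 \mid Z^*)\,P(T \le t_1 \mid Z^*).
\end{aligned}
$$

The second equality holds because, under a correct Cox model for C given L,
$P(C \le t_2 \mid L)$ depends on L only through $\gamma^{*\prime}L$, which is a component of $Z^*$.
Therefore, T and C are independent conditional on $Z^*$. Hence, writing $F_C$ for
the distribution function of C given $Z^* = z$,

$$
\begin{aligned}
\int_0^t \frac{d_u P(T \wedge C \le u, R = 1 \mid Z^* = z)}{P(T \wedge C \ge u \mid Z^* = z)}
&= -\int_0^t \frac{d_u \int_u^{\infty} \bigl[P(c \ge T \mid Z^* = z) - P(u \ge T \mid Z^* = z)\bigr]\,dF_C(c)}{P(C \ge u, T \ge u \mid Z^* = z)} \\
&= \int_0^t \frac{d_u P(T \le u \mid Z^* = z)}{P(T \ge u \mid Z^* = z)} \\
&= H_{T|Z^*}(t|z). \qquad \Box
\end{aligned}
$$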



3.2. Asymptotic properties of the estimator $\hat S_n(t)$. The main result is the
asymptotic property for $\hat S_n(t)$ given below.

Theorem 3.2. Under Assumptions 3.1-3.5, if either of the two working
models is correct, that is, either the model for T given L is a Cox's
proportional hazard model or the model for C given L is a Cox's proportional
hazard model,

$$\sqrt{n}\bigl(\hat S_n(t) - S(t)\bigr) \Rightarrow G(t) \quad \text{in } l^\infty([0,\tau]),$$

where $G(\cdot)$ is a Gaussian process.

Remark 3.2. Indeed, the covariance of $G(\cdot)$ has an explicit form. From
the proof of Theorem 3.2, $\hat S_n(t)$ is an asymptotically linear estimator of $S(t)$
and its influence function, denoted as $A(t;Y,R,L)$, is equal to

$$
\begin{aligned}
& e^{-H_{T|Z^*}(t|Z^*)} - S(t) - R\,I_{Y \le t}\,e^{H_{T|Z^*}(Y|Z^*) + H_{C|Z^*}(Y|Z^*) - H_{T|Z^*}(t|Z^*)} \\
&\quad + \int_0^{t \wedge Y} e^{H_{T|Z^*}(u|Z^*) + H_{C|Z^*}(u|Z^*) - H_{T|Z^*}(t|Z^*)}\,d_u H_{T|Z^*}(u|Z^*) \\
&\quad + B_1(t;Y,R,L) + B_2(t;Y,R,L),
\end{aligned}
\tag{3.1}
$$
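where $B_1(t;Y,R,L)$ is

$$-E\Bigl[e^{-H_{T|Z^*}(t|Z^*)}\,\nabla_\gamma\Big|_{\gamma=\gamma^*}\int_0^t \frac{d_u P(T \wedge C \le u, R = 1 \mid \gamma'L, \beta^{*\prime}L)}{P(T \wedge C \ge u \mid \gamma'L, \beta^{*\prime}L)}\Bigr] \times S_\gamma(\gamma^*,Y,R,L)$$

and $B_2(t;Y,R,L)$ is

$$-E\Bigl[e^{-H_{T|Z^*}(t|Z^*)}\,\nabla_\beta\Big|_{\beta=\beta^*}\int_0^t \frac{d_u P(T \wedge C \le u, R = 1 \mid \gamma^{*\prime}L, \beta'L)}{P(T \wedge C \ge u \mid \gamma^{*\prime}L, \beta'L)}\Bigr] \times S_\beta(\beta^*,Y,R,L).$$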



Therefore, the covariance function, denoted by $r(s,t)$, for the limit Gaussian
process is equal to $\mathrm{Cov}(A(s;Y,R,L), A(t;Y,R,L))$. Interestingly, the covariance
of the limiting process $G(\cdot)$ does not depend on the choice of the kernel
function or the choice of the bandwidth in deriving the estimator $\hat S_n(t)$.

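In the expression of (3.1), the two terms $B_1(t;Y,R,L)$ and $B_2(t;Y,R,L)$
contribute to the variation of $\hat S_n(t)$ due to the estimation of
$\beta^*$ and $\gamma^*$. Moreover, if the working model of T given L is correct, by
repeating the arguments in proving Lemma 3.1, we easily obtain that for
any $\gamma$,

$$\int_0^t \frac{d_u P(T \wedge C \le u, R = 1 \mid \gamma'L, \beta^{*\prime}L)}{P(T \wedge C \ge u \mid \gamma'L, \beta^{*\prime}L)} = H_{T|\beta^{*\prime}L,\gamma'L}(t \mid \beta^{*\prime}L, \gamma'L).$$

Therefore,

$$
\begin{aligned}
-E\Bigl[e^{-H_{T|Z^*}(t|Z^*)}\,\nabla_\gamma\Big|_{\gamma=\gamma^*}\int_0^t \frac{d_u P(T \wedge C \le u, R = 1 \mid \beta^{*\prime}L, \gamma'L)}{P(T \wedge C \ge u \mid \beta^{*\prime}L, \gamma'L)}\Bigr]
&= \nabla_\gamma\Big|_{\gamma=\gamma^*} E\,e^{-H_{T|\beta^{*\prime}L,\gamma'L}(t \mid \beta^{*\prime}L, \gamma'L)} \\
&= \nabla_\gamma\Big|_{\gamma=\gamma^*} S(t) = 0.
\end{aligned}
$$

Hence, we conclude that $B_1(t;Y,R,L)$ is zero. Similarly, $B_2(t;Y,R,L)$ is zero
if the working model for C given L is correct.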

Corollary 3.1. In the expression of (3.1), if the working model for T
given L is correct, $B_1(t;Y,R,L) = 0$; if the working model for C given L is
correct, $B_2(t;Y,R,L) = 0$.

As a result, when both working models are correct, $B_1(t;Y,R,L) = B_2(t;Y,R,L) = 0$
and moreover, $H_{T|Z^*}(t|Z^*) = H_{T|L}(t|L)$, $H_{C|Z^*}(t|Z^*) = H_{C|L}(t|L)$. Hence,
simple calculation gives that the influence function in (3.1) for $\hat S_n(t)$ is equal
to



$$S_{T|L}(t|L) - S(t) + \frac{R\,(I(T > t) - S(t))}{S_{C|L}(T|L)} + \int E\Bigl[\frac{R\,(I(T > t) - S(t))}{S_{C|L}(T|L)} \Bigm| L, T > u, C > u\Bigr]\,dM_C(u),$$



where $dM_C(u) = (1 - R)\,dI(Y \le u) - I(Y \ge u)\,dH_{C|L}(u|L)$ is the martingale
process for the censoring time. This turns out to be the efficient influence
function for $S(t)$ in the full model space, which was derived in an unpublished
manuscript by Gill, van der Laan and Robins (1997). Consequently, we have
obtained the following corollary.

Corollary 3.2. When both working models are correct, the asymptotic
variance of $\hat S_n(t)$ is the same as the generalized Cramér-Rao bound for $S(t)$.

3.3. Variance estimation for estimating a Fréchet differentiable functional
of S(t). In survival analysis, practical interest may include the estimation
of some functional of $S(t)$, such as the survival probability at a fixed
time $t_0$, the observed mean lifetime $E[T \mid T \le \tau]$, and median lifetime, and
so on. Denote such a functional of $S(t)$ as $\Psi(S(t))$. Then we can estimate
it with $\Psi(\hat S_n(t))$. Furthermore, if $\Psi(\cdot)$ is Fréchet differentiable with its first
derivative along direction $\hat S_n(t) - S(t)$ given by $\int_0^\tau (\hat S_n(t) - S(t))\,d\psi(t)$ for
a bounded variation function $\psi$, then the functional delta theorem implies
that $\sqrt{n}(\Psi(\hat S_n(t)) - \Psi(S(t)))$ has an asymptotic normal distribution
with mean zero and variance $\sigma^2 = \int_0^\tau\!\int_0^\tau r(s,t)\,d\psi(s)\,d\psi(t)$, where
$r(s,t) = E[A(s;Y,R,L)\,A(t;Y,R,L)]$. In this section we want to give a general
procedure for estimating $\sigma^2$.

Denote $\mathbf{P}_n$ as the empirical measure of the i.i.d. observations $(Y_i, R_i, L_i)$,
$i = 1,\ldots,n$. Clearly, one consistent estimator of $\sigma^2$ is given by

$$\int_0^\tau\!\int_0^\tau \mathbf{P}_n\bigl[\hat A(s;Y,R,L)\,\hat A(t;Y,R,L)\bigr]\,d\psi(s)\,d\psi(t),$$

in which $\hat A(t;Y,R,L)$ is a consistent estimator of $A(t;Y,R,L)$. To obtain
$\hat A(t;Y,R,L)$, we estimate each term in the expression (3.1) separately.

First, in (3.1), we substitute $H_{T|Z^*}(t|Z^*)$ and $H_{C|Z^*}(t|Z^*)$ with their
corresponding estimators $\hat H_{T|Z^*}(t|\hat Z)$ and $\hat H_{C|Z^*}(t|\hat Z)$ following Step 3 of
Section 2; furthermore, according to the proof of Theorem 3.1, we can
consistently estimate the influence functions for $\hat\beta_n$ and $\hat\gamma_n$ by $\hat S_\beta(\hat\beta_n,Y,R,L)$ and
$\hat S_\gamma(\hat\gamma_n,Y,R,L)$, respectively. Specifically, $\hat S_\beta(\hat\beta_n,y,r,l)$ is



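$$
\begin{aligned}
&\Bigl\{\mathbf{P}_n\Bigl[R\Bigl(\frac{\mathbf{P}_n[I_{Y \ge y'}\,LL'e^{\hat\beta_n'L}]}{\mathbf{P}_n[I_{Y \ge y'}\,e^{\hat\beta_n'L}]} - \frac{\mathbf{P}_n[I_{Y \ge y'}\,Le^{\hat\beta_n'L}]^{\otimes 2}}{\mathbf{P}_n[I_{Y \ge y'}\,e^{\hat\beta_n'L}]^{2}}\Bigr)\Bigr|_{y'=Y}\Bigr]\Bigr\}^{-1} \\
&\quad \times \Bigl\{ r\,l - r\,\frac{\mathbf{P}_n[I_{Y \ge y}\,Le^{\hat\beta_n'L}]}{\mathbf{P}_n[I_{Y \ge y}\,e^{\hat\beta_n'L}]}
 - l\,e^{\hat\beta_n'l}\,\mathbf{P}_n\Bigl[\frac{R\,I_{Y \le y}}{\mathbf{P}_n[I_{Y \ge y'}\,e^{\hat\beta_n'L}]\big|_{y'=Y}}\Bigr] \\
&\qquad\quad + e^{\hat\beta_n'l}\,\mathbf{P}_n\Bigl[\frac{R\,I_{Y \le y}\,\mathbf{P}_n[I_{Y \ge y'}\,Le^{\hat\beta_n'L}]\big|_{y'=Y}}{\mathbf{P}_n[I_{Y \ge y'}\,e^{\hat\beta_n'L}]^{2}\big|_{y'=Y}}\Bigr]\Bigr\}
\end{aligned}
\tag{3.2}
$$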



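and $\hat S_\gamma(\hat\gamma_n,y,r,l)$ is

$$
\begin{aligned}
&\Bigl\{\mathbf{P}_n\Bigl[(1-R)\Bigl(\frac{\mathbf{P}_n[I_{Y \ge y'}\,LL'e^{\hat\gamma_n'L}]}{\mathbf{P}_n[I_{Y \ge y'}\,e^{\hat\gamma_n'L}]} - \frac{\mathbf{P}_n[I_{Y \ge y'}\,Le^{\hat\gamma_n'L}]^{\otimes 2}}{\mathbf{P}_n[I_{Y \ge y'}\,e^{\hat\gamma_n'L}]^{2}}\Bigr)\Bigr|_{y'=Y}\Bigr]\Bigr\}^{-1} \\
&\quad \times \Bigl\{ (1-r)\,l - (1-r)\,\frac{\mathbf{P}_n[I_{Y \ge y}\,Le^{\hat\gamma_n'L}]}{\mathbf{P}_n[I_{Y \ge y}\,e^{\hat\gamma_n'L}]}
 - l\,e^{\hat\gamma_n'l}\,\mathbf{P}_n\Bigl[\frac{(1-R)\,I_{Y \le y}}{\mathbf{P}_n[I_{Y \ge y'}\,e^{\hat\gamma_n'L}]\big|_{y'=Y}}\Bigr] \\
&\qquad\quad + e^{\hat\gamma_n'l}\,\mathbf{P}_n\Bigl[\frac{(1-R)\,I_{Y \le y}\,\mathbf{P}_n[I_{Y \ge y'}\,Le^{\hat\gamma_n'L}]\big|_{y'=Y}}{\mathbf{P}_n[I_{Y \ge y'}\,e^{\hat\gamma_n'L}]^{2}\big|_{y'=Y}}\Bigr]\Bigr\}.
\end{aligned}
\tag{3.3}
$$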



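Additionally, we can estimate

$$-E\Bigl[e^{-H_{T|Z^*}(t|Z^*)}\,\nabla_\gamma\Big|_{\gamma=\gamma^*}\int_0^t \frac{d_u P(T \wedge C \le u, R = 1 \mid \gamma'L, \beta^{*\prime}L)}{P(T \wedge C \ge u \mid \gamma'L, \beta^{*\prime}L)}\Bigr] \tag{3.4}$$

and

$$-E\Bigl[e^{-H_{T|Z^*}(t|Z^*)}\,\nabla_\beta\Big|_{\beta=\beta^*}\int_0^t \frac{d_u P(T \wedge C \le u, R = 1 \mid \gamma^{*\prime}L, \beta'L)}{P(T \wedge C \ge u \mid \gamma^{*\prime}L, \beta'L)}\Bigr] \tag{3.5}$$

using the following lemma.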



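Lemma 3.2. For any constants $(\beta,\gamma)$, we define an estimator of $S(t)$,
denoted by $\hat S_n(t;\beta,\gamma)$, by repeating Steps 1-4 in Section 2 for fixed $\beta$ and $\gamma$.
Let $e_1,\ldots,e_k$ be the canonical bases in $R^{\dim(\beta^*)}$; that is, $e_i$ has 1 at the
ith position and 0's elsewhere. Similarly, let $d_1,\ldots,d_l$ be the canonical bases
in $R^{\dim(\gamma^*)}$. Moreover, we select a constant $\varepsilon_n$ such that $\varepsilon_n = o(a_n)$,
$\sqrt{n}\,\varepsilon_n \to \infty$. Then when one of the working models is correct, the two statistics,
defined by

$$\hat\nabla_\gamma = \frac{1}{\varepsilon_n}\begin{pmatrix} \hat S_n(t;\hat\beta_n,\hat\gamma_n + \varepsilon_n d_1) - \hat S_n(t) \\ \vdots \\ \hat S_n(t;\hat\beta_n,\hat\gamma_n + \varepsilon_n d_l) - \hat S_n(t) \end{pmatrix} \tag{3.6}$$

and

$$\hat\nabla_\beta = \frac{1}{\varepsilon_n}\begin{pmatrix} \hat S_n(t;\hat\beta_n + \varepsilon_n e_1,\hat\gamma_n) - \hat S_n(t) \\ \vdots \\ \hat S_n(t;\hat\beta_n + \varepsilon_n e_k,\hat\gamma_n) - \hat S_n(t) \end{pmatrix} \tag{3.7}$$

are consistent estimators of (3.4) and (3.5), respectively.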
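
The two statistics in Lemma 3.2 appear to be forward difference quotients of the estimator along each canonical direction. Under that reading, a generic helper such as the following sketch would compute them; `estimator` stands for a user-supplied function (hypothetical here) that returns the survival estimate at a fixed time for a given coefficient vector, and `step` plays the role of the small constant in the lemma.

```python
def difference_quotients(estimator, point, step):
    """Forward difference quotients of `estimator` along each canonical direction.

    estimator -- callable mapping a coefficient vector (list of floats) to a number
    point     -- coefficient vector at which the quotients are formed
    step      -- size of the perturbation applied to one coordinate at a time
    """
    base = estimator(point)
    quotients = []
    for i in range(len(point)):
        shifted = list(point)
        shifted[i] += step  # perturb only the i-th coordinate
        quotients.append((estimator(shifted) - base) / step)
    return quotients
```

For the first statistic the censoring-model coefficients are perturbed with the lifetime-model coefficients held fixed, and vice versa for the second; each call to `estimator` costs one full evaluation of the survival estimate.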
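So finally, one consistent estimator for $A(t;Y,R,L)$ is given by

$$
\begin{aligned}
& e^{-\hat H_{T|Z^*}(t|\hat Z)} - \hat S_n(t) - R\,I_{Y \le t}\,e^{\hat H_{T|Z^*}(Y|\hat Z) + \hat H_{C|Z^*}(Y|\hat Z) - \hat H_{T|Z^*}(t|\hat Z)} \\
&\quad + \int_0^{t \wedge Y} e^{\hat H_{T|Z^*}(u|\hat Z) + \hat H_{C|Z^*}(u|\hat Z) - \hat H_{T|Z^*}(t|\hat Z)}\,d_u \hat H_{T|Z^*}(u|\hat Z) \\
&\quad + \hat\nabla_\gamma'\,\hat S_\gamma(\hat\gamma_n,Y,R,L) + \hat\nabla_\beta'\,\hat S_\beta(\hat\beta_n,Y,R,L).
\end{aligned}
$$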

Remark 3.3. The numerical method for estimating (3.4) and (3.5) is
much more convenient for implementation, compared with the direct
estimation of the conditional probabilities in these two expressions. When the
bandwidth $a_n$ has order $n^{-1/3}$, one choice of $\varepsilon_n$ may be of the order $n^{-5/12}$.
Computationally, except that the final evaluation of the variance requires a
numerical double integration, the computing time in the other steps is only
a linear order of the computing time for computing $\hat S_n(t)$, which is about
$O(n^3 a_n^2)$. The storage in the computation is the same order as storing an
$n \times n$ numerical array.
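
That the suggested order for the perturbation constant is compatible with Lemma 3.2 can be checked by comparing exponents. The sketch below assumes the two requirements are $\varepsilon_n = o(a_n)$ and $\sqrt{n}\,\varepsilon_n \to \infty$, and takes $a_n = n^{-1/3}$ and $\varepsilon_n = n^{-5/12}$ with all constants set to one.

```latex
\begin{align*}
\frac{\varepsilon_n}{a_n} &= n^{-5/12 + 1/3} = n^{-1/12} \to 0, \\
\sqrt{n}\,\varepsilon_n &= n^{1/2 - 5/12} = n^{1/12} \to \infty .
\end{align*}
```

More generally, with $a_n = n^{-1/3}$ any $\varepsilon_n = n^{-\delta}$ with $1/3 < \delta < 1/2$ meets both requirements; $\delta = 5/12$ is the midpoint of that interval.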

Remark 3.4. As a special example, the asymptotic variance for $\sqrt{n}(\hat S_n(t_0) - S(t_0))$
can be approximated by $\mathbf{P}_n[\hat A(t_0;Y,R,L)^2]$ for any fixed time $t_0 \in [0,\tau]$.
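
Given the estimated influence-function values at a fixed time, the variance approximation of Remark 3.4 leads directly to a Wald-type confidence interval. The sketch below is an illustration under that reading: the function name is hypothetical, the input is assumed to hold one estimated influence value per subject, and the normal critical value 1.96 corresponds to a nominal 95% level.

```python
import math

def pointwise_confidence_interval(estimate, influence_values, z_crit=1.96):
    """Wald interval for the survival probability at a fixed time.

    estimate         -- the survival estimate at the fixed time
    influence_values -- estimated influence-function values, one per subject
    """
    n = len(influence_values)
    # Empirical second moment of the influence values: the variance of
    # sqrt(n) * (estimate - truth) in the approximation of Remark 3.4.
    variance = sum(a * a for a in influence_values) / n
    standard_error = math.sqrt(variance / n)
    return estimate - z_crit * standard_error, estimate + z_crit * standard_error
```

The interval is not truncated to [0, 1]; a practical implementation might clip it or work on a transformed scale.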

4. Simulation study. We have performed a simulation study to show the
advantages of our approach in small samples. In the simulation, three
covariates, denoted as $X_1, X_2, X_3$, were independently generated from the uniform
distribution between 0 and 1. The lifetime T was generated from Cox's
proportional hazard model, whose hazard rate function had the following form:

$$h_{T|X}(t|X) = \lambda_T(t)\exp\{\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3\}.$$

The values of the parameters in the simulation were taken to be $\beta_1 = -1$,
$\beta_2 = 4$, $\beta_3 = 3$, $\beta_{12} = 0$, $\beta_{13} = 6$, $\beta_{23} = 10$, $\lambda_T(t) = t^4 e^{-5}$. The censoring time
C was the minimum of $\tau = 2$ and $C^*$, where $C^*$ was produced using Cox's
proportional hazard model with the hazard rate function given by

$$h_{C|X}(t|X) = \lambda_C(t)\exp\{\gamma_1 X_1 + \gamma_2 X_2 + \gamma_3 X_3 + \gamma_{12} X_1 X_2 + \gamma_{13} X_1 X_3 + \gamma_{23} X_2 X_3\}.$$

We chose the parameters as $\gamma_1 = 1$, $\gamma_2 = 1$, $\gamma_3 = 1$, $\gamma_{12} = 0$, $\gamma_{13} = 5$, $\gamma_{23} = 10$,
$\lambda_C(t) = t^4 e^{-4.5}$. With this choice of the parameter values, the
dependence between T and C was substantial (theoretically, the
marginal correlation between T and C was around 75%) and the censoring
proportion was not too low (the theoretical censoring probability for this
setting is 45%).
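
The data-generating mechanism can be sketched by inverse-transform sampling. The code below is an illustration only: it assumes both baseline hazards are of the form t^4 times a constant (exp(-5) for the lifetime and exp(-4.5) for the censoring time, as read from the text), so that the cumulative hazard is that constant times exp(linear predictor) times t^5 / 5; the function names and the seed handling are hypothetical.

```python
import math
import random

def draw_time(eta, log_scale, rng):
    """Draw from the hazard t^4 * exp(log_scale + eta) by inverting the cumulative hazard.

    Cumulative hazard: H(t) = exp(log_scale + eta) * t^5 / 5; solve H(T) = E, E ~ Exp(1).
    """
    e = rng.expovariate(1.0)
    return (5.0 * e * math.exp(-(log_scale + eta))) ** 0.2

def linear_predictor(x, main, inter):
    """Main effects plus the three two-way interactions of x = (x1, x2, x3)."""
    x1, x2, x3 = x
    b12, b13, b23 = inter
    return (main[0] * x1 + main[1] * x2 + main[2] * x3
            + b12 * x1 * x2 + b13 * x1 * x3 + b23 * x2 * x3)

def simulate(n, tau=2.0, seed=0):
    """Return a list of (Y, R, X) triples with Y = min(T, C) and R = 1 if T <= C."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random(), rng.random())
        # Parameter values as read from the text (assumed, not verified).
        t = draw_time(linear_predictor(x, (-1.0, 4.0, 3.0), (0.0, 6.0, 10.0)), -5.0, rng)
        c_star = draw_time(linear_predictor(x, (1.0, 1.0, 1.0), (0.0, 5.0, 10.0)), -4.5, rng)
        c = min(tau, c_star)  # administrative censoring at the end of the study
        data.append((min(t, c), 1 if t <= c else 0, x))
    return data
```

A quick check of such a generator is to compare the empirical censoring proportion over many replications with the value quoted in the text.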

We followed the procedure in Section 2 to estimate the survival function
for T with the kernel function $K(x_1,x_2) = \exp\{-(x_1^2 + x_2^2)\}$ but started with
different working models for T and C given $(X_1,X_2,X_3)$. Specifically, if we
denote X as $(X_1,X_2,X_3)$ and denote $X^2$ as their two-way interactions,
six pairs of working models could be considered:

Pair 1. We modelled both T and C using all the main effects X and the two-
way interactions $X^2$ as well as an independent variable Z, which was
generated from the uniform distribution between 0 and 1.

Pair 2. We modelled both T and C using all the main effects X and the
two-way interactions $X^2$.

Pair 3. We modelled T using X and $X^2$; however, we modelled C using only
the main effects X. So we misspecified the model for C.

Pair 4. We modelled C using X and $X^2$; however, we modelled T using only
the main effects X. So we misspecified the model for T.

Pair 5. We modelled both T and C using only the main effects X. That is, 
we misspecified both models. 

Pair 6. We did not account for any covariates and the Kaplan-Meier esti- 
mate was used to estimate the survival function. 

By comparison of the bias and variation among the above six pairs of 
working models, we expected to verify that the estimates accounting for 
dependent censoring using covariates in the estimation always perform bet- 
ter than the Kaplan-Meier estimate, that including an irrelevant variable 
does not bias the estimate and that double robustness is evidenced in small 
samples. 

Moreover, we studied how the estimates varied with the different choices
of the sample size n, the bandwidth $a_n$ and the oscillation parameter $\varepsilon_n$. We
thus generated data with sample size n = 50 or n = 100. For each generated
sample, we used varied bandwidths $a_n = n^{-1/3}, 3n^{-1/3}, 6n^{-1/3}$ to calculate
the estimates. In addition, we used different choices $\varepsilon_n = n^{-5/12}, 5n^{-5/12}, 10n^{-5/12}$
to calculate the standard errors and the coverage probabilities in estimating
the survival probabilities for $t = \tau/5, 2\tau/5$. Such computation was repeated
500 times.

For both n = 50 and n = 100, the average censoring proportion was about 45% 
and the marginal correlation between T and C was 77% in the simulated 
samples. Tables 1 and 2 report our findings. In Table 1 we give the average 
mean square error of the estimates on 50 grid points, which is defined as 

In Table 2 we report the average bias and the 95% confidence interval
coverage probabilities for estimating the survival probabilities at times $\tau/5$ and
$2\tau/5$. Since it has been found that the coverage probabilities vary very little
when $\varepsilon_n$ varies in our choices, we only report the results for $\varepsilon_n = n^{-5/12}$.
From Table 1, it is clear that the Kaplan-Meier estimates have the largest
mean square error and the estimates adjusting for dependent censoring using 
covariates can reduce it by 50% for sample size 50 and by over 60% for sample 
size 100. Moreover, using the irrelevant covariate Z in the regression models 
does not increase the mean square error, and when either of the regression 
models is correct (i.e., both the main effects and the two-way interactions 
among $X_1, X_2, X_3$ are used in the regression), the mean square errors are,
on average, 10% less than for the case which only uses the main effects in 
both regressions. The mean square errors of the estimates are fairly robust 
to the choice of the bandwidth. The results displayed in Table 2 further 
evidence the above findings from the view of the point estimates of S(t) and 
the corresponding coverage probabilities. Table 2 shows that when either re- 
gression model is specified correctly, the biases in the estimates are less than 
for the cases when both models are misspecified; the Kaplan-Meier estimate induces
the largest biases. Overall, these biases decrease by 50% when the sample 
size increases from 50 to 100. With the sample size 50 or 100, the coverage 
probabilities using the methods proposed in Section 3 are fairly accurate for 
$t = \tau/5$ when either regression model is specified correctly; however, they
tend to be smaller for $t = 2\tau/5$ due to the larger bias caused by high
censoring at the tail. When the bandwidth is large (for instance, $a_n = 6n^{-1/3}$),
the biases increase due to oversmoothing, but the coverage probabilities do 
not vary much. 

Our simulation study indicates that the estimates of the survival func- 
tion by adjusting for dependent censoring using auxiliary covariates always 

Table 1
Mean square error from 500 samples

                                   MSE (x10^-3)      MSE (x10^-3)       MSE (x10^-3)
  n    model T      model C      a_n = n^(-1/3)   a_n = 3n^(-1/3)   a_n = 6n^(-1/3)
 50    (X,X^2,Z)    (X,X^2,Z)          7.2               6.9               6.8
       (X,X^2)      (X,X^2)            7.2               6.8               6.8
       (X,X^2)      (X)^a              7.0               6.7               6.7
       (X)^a        (X,X^2)            7.1               6.9               7.0
       (X)^a        (X)^a              7.7               7.4               7.5
       (-)^b        (-)^b             17.4              17.4              17.4
100    (X,X^2,Z)    (X,X^2,Z)          3.5               3.4               3.3
       (X,X^2)      (X,X^2)            3.5               3.4               3.3
       (X,X^2)      (X)^a              3.4               3.3               3.3
       (X)^a        (X,X^2)            3.7               3.5               3.5
       (X)^a        (X)^a              4.3               4.0               4.0
       (-)^b        (-)^b             13.0              13.0              13.0

Notation. (...)^a: model is misspecified; (-)^b: the Kaplan-Meier estimate is used.
Table 2
Estimate of the survival probability at times t = tau/5 and t = 2tau/5 from 500 samples
with epsilon_n = n^(-5/12)

                                                    S(tau/5)                     S(2tau/5)
  n    a_n         model T      model C      bias(x10^[illegible])  95% cp   bias(x10^[illegible])  95% cp
 50    n^(-1/3)    (X,X^2,Z)    (X,X^2,Z)        0.94            0.94          1.97            0.92
                   (X,X^2)      (X,X^2)          0.94            0.94          1.95            0.92
                   (X,X^2)      (X)^a            0.96            0.94          2.03            0.93
                   (X)^a        (X,X^2)       [illegible]     [illegible]   [illegible]     [illegible]
                   (X)^a        (X)^a            1.22            0.93          2.99            0.92
                   (-)^b        (-)^b            5.69            0.87         10.72            0.70
       3n^(-1/3)   (X,X^2,Z)    (X,X^2,Z)        0.94            0.94          1.95            0.91
                   (X,X^2)      (X,X^2)          0.96            0.94          2.02            0.91
                   (X,X^2)      (X)^a            1.00            0.93          2.12            0.90
                   (X)^a        (X,X^2)       [illegible]     [illegible]   [illegible]     [illegible]
                   (X)^a        (X)^a            1.46            0.92          3.02            0.87
                   (-)^b        (-)^b            5.69            0.87         10.72            0.70
       6n^(-1/3)   (X,X^2,Z)    (X,X^2,Z)        1.30            0.93          2.50            0.90
                   (X,X^2)      (X,X^2)          1.33            0.93          2.58            0.90
                   (X,X^2)      (X)^a            1.50            0.93          2.72            0.89
                   (X)^a        (X,X^2)          1.54            0.92          2.76            0.89
                   (X)^a        (X)^a            2.16            0.92          3.58            0.87
                   (-)^b        (-)^b            5.69            0.87         10.72            0.70
100    n^(-1/3)    (X,X^2,Z)    (X,X^2,Z)        0.44            0.94          1.37            0.96
                   (X,X^2)      (X,X^2)          0.44            0.94          1.33            0.95
                   (X,X^2)      (X)^a            0.48            0.93          1.36            0.94
                   (X)^a        (X,X^2)          0.52            0.92          1.51            0.92
                   (X)^a        (X)^a            0.95            0.93          2.73            0.89
                   (-)^b        (-)^b            5.44            0.75         10.50            0.51
       3n^(-1/3)   (X,X^2,Z)    (X,X^2,Z)        0.45            0.92          1.46            0.93
                   (X,X^2)      (X,X^2)          0.48            0.93          1.42            0.92
                   (X,X^2)      (X)^a            0.57            0.92          1.48            0.92
                   (X)^a        (X,X^2)          0.55            0.91          1.52            0.91
                   (X)^a        (X)^a            1.13            0.90          2.83            0.89
                   (-)^b        (-)^b            5.44            0.75         10.50            0.51
       6n^(-1/3)   (X,X^2,Z)    (X,X^2,Z)        0.79            0.92          1.89            0.91
                   (X,X^2)      (X,X^2)          0.82            0.92          1.89            0.92
                   (X,X^2)      (X)^a            0.99            0.93          2.09            0.91
                   (X)^a        (X,X^2)          0.96            0.91          1.98            0.91
                   (X)^a        (X)^a            1.67            0.90          3.23            0.89
                   (-)^b        (-)^b            5.44            0.75         10.50            0.51

Notation. (...)^a: model is misspecified; (-)^b: the Kaplan-Meier estimate is used.
induce smaller mean square errors, smaller biases and more accurate coverage
probabilities compared with the Kaplan-Meier estimates. Moreover, the es- 
timates have better performance when either the model for T or the model 
for C given the covariates is used correctly. The overall mean square errors of 
the consistent estimates are fairly robust to the choice of the bandwidth; but 
the point estimates and the inference vary with the choices of the bandwidth 
and the location of time points. 

5. Discussion. Both our large-sample theory and our small-sample
simulation studies indicate that, when right-censored data include
high-dimensional auxiliary covariates, condensing such information through
working models for both the lifetime and the censoring time given covariates
makes adjusting for dependent censoring possible. The resulting estimator is
robust to misspecification of either working model and to the accidental use
of irrelevant information.

It is observed in our simulations that the choice of the bandwidth $a_n$
plays an important role in the bias of, and the inference for, the point
estimate. A large $a_n$ may oversmooth the conditional hazard rate estimator
(in fact, a simple calculation shows that, for fixed n, as $a_n$ tends to
infinity our estimate approaches the Kaplan-Meier estimate), while a small
$a_n$ may overfit the conditional hazard rate estimator and thus introduce
large variation in estimation. So far, we have let $a_n$ be a constant
depending only on n, and no general selection rule has been followed;
however, the simulation results suggest that a data-adaptive and
location-adaptive $a_n$ may give a better-performing estimate.
Cross-validation may be used to choose $a_n$, or we can use the k-nearest-
neighbor approach in nonparametric hazard regression. We will explore this
issue further in the future.
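
The role of the bandwidth can be illustrated with a small numerical sketch.
The code below is not the implementation used in the simulations: it assumes
a product Gaussian kernel, treats the two-dimensional index Z as given rather
than estimated from working models, and uses simulated data; all function
names are hypothetical. It computes a kernel-weighted product-limit estimate
of the conditional survival function, averages it over the observed index
values, and compares the result with the Kaplan-Meier estimate for a very
large and for a moderate bandwidth.

```python
import numpy as np


def gaussian_kernel(u):
    """Product Gaussian kernel on the rows of u (an assumed choice of K)."""
    return np.exp(-0.5 * np.sum(u ** 2, axis=1))


def conditional_survival(t, z, Z, Y, R, a_n):
    """Kernel-weighted product-limit estimate of P(T > t | Z = z)."""
    w = gaussian_kernel((Z - z) / a_n)
    surv = 1.0
    for s in np.unique(Y[(R == 1) & (Y <= t)]):   # observed failure times up to t
        at_risk = np.sum(w[Y >= s])
        events = np.sum(w[(Y == s) & (R == 1)])
        if at_risk > 0:
            surv *= 1.0 - events / at_risk
    return surv


def marginal_survival(t, Z, Y, R, a_n):
    """Average the conditional estimates over the observed index values."""
    return float(np.mean([conditional_survival(t, z, Z, Y, R, a_n) for z in Z]))


def kaplan_meier(t, Y, R):
    """Ordinary Kaplan-Meier estimate of P(T > t), ignoring covariates."""
    surv = 1.0
    for s in np.unique(Y[(R == 1) & (Y <= t)]):
        surv *= 1.0 - np.sum((Y == s) & (R == 1)) / np.sum(Y >= s)
    return surv


# Simulated data (illustrative only): both T and C depend on the index Z.
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 2))
T = rng.exponential(np.exp(-0.5 * Z[:, 0]))
C = rng.exponential(np.exp(-0.5 * Z[:, 1]))
Y, R = np.minimum(T, C), (T <= C).astype(int)

t0 = 0.5
km = kaplan_meier(t0, Y, R)
smooth_large = marginal_survival(t0, Z, Y, R, a_n=1e6)   # nearly equal weights
smooth_small = marginal_survival(t0, Z, Y, R, a_n=0.5)   # local weights
```

With `a_n=1e6` the kernel weights are essentially constant, so `smooth_large`
agrees with `km` up to rounding, whereas `smooth_small` uses local weights
and generally differs from it.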

Although we hope that our working models are correct, in practice we never
know whether they are. To make correct specification more plausible, we may
use working models more general than the Cox proportional hazards model; for
example, we can use a generalized additive model or include spline terms as
covariates in the working models. A model selection rule is then useful for
choosing the working models that best balance the performance of the
estimates against model complexity. A goodness-of-fit test, as well as a
test for comparing two different sets of working models, would therefore be
useful in practice.

Finally, when L includes time-dependent covariates, it is not obvious how
to adapt our approach. This is because the condensed information
$(\beta'L, \gamma'L)$ is still time-dependent, so its dimension is infinite;
an essential problem is then how to derive a nonparametric estimate of the
marginal survival function in the presence of even a single time-dependent
covariate.
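Further exploration of this issue is ongoing.

Proof of Theorem 3.1. We consider only the estimator $\hat\beta_n$ in the
following. The argument for the estimator $\hat\gamma_n$ is similar.
Obviously, $\hat\beta_n$ maximizes

$$L_{1n}(\beta) = \frac{1}{n}\sum_{i=1}^n R_i\beta'L_i
 - \frac{1}{n}\sum_{i=1}^n R_i \log\Bigl(\frac{1}{n}\sum_{Y_j \ge Y_i} e^{\beta'L_j}\Bigr).$$

Note that $L_{1n}(\beta)$ is a concave function of $\beta$ and its limit,
which is equal to
$L_1(\beta) = P[R\beta'L - R\log E[I_{Y\ge y}e^{\beta'L}]|_{y=Y}]$, is a
strictly concave function. By an argument similar to that in Andersen and
Gill (1982), we obtain that with probability 1, $\hat\beta_n$ converges to
the unique maximum of $L_1(\beta)$, denoted by $\beta^*$.

After the linearization of the equation
$\nabla_\beta L_{1n}(\hat\beta_n) = 0$ around $\beta^*$, we obtain that

$$\sqrt{n}(\hat\beta_n - \beta^*) = \sqrt{n}(P_n - P)S(\beta^*,Y,R,L) + o_p(1),$$

where the influence function $S(\beta^*,y,r,l)$ is equal to

$$-\{\nabla^2_{\beta\beta}L_1(\beta^*)\}^{-1}\Biggl\{
 rl - r\,\frac{P[I_{Y\ge y}Le^{\beta^{*\prime}L}]}{P[I_{Y\ge y}e^{\beta^{*\prime}L}]}
 - le^{\beta^{*\prime}l}\,P\Biggl[\frac{RI_{Y\le y}}{P[I_{Y\ge y'}e^{\beta^{*\prime}L}]|_{y'=Y}}\Biggr]
 + e^{\beta^{*\prime}l}\,P\Biggl[\frac{RI_{Y\le y}\,P[I_{Y\ge y'}Le^{\beta^{*\prime}L}]|_{y'=Y}}
 {P[I_{Y\ge y'}e^{\beta^{*\prime}L}]^2|_{y'=Y}}\Biggr]\Biggr\}.$$

□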
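
The objective maximized in the proof of Theorem 3.1 is the Cox log partial
likelihood, and its concavity is what makes the maximizer well defined even
when the working model is wrong. The following sketch, with hypothetical
function names and simulated data, evaluates a normalized version of that
objective and maximizes it by Newton-Raphson. In the simulation the Cox model
happens to be correct, so the limit $\beta^*$ coincides with the
data-generating coefficient; under misspecification the same code would
converge to a different $\beta^*$.

```python
import numpy as np


def log_partial_likelihood(beta, L, Y, R):
    """Normalized Cox log partial likelihood for data (Y_i, R_i, L_i)."""
    eta = L @ beta
    at_risk = (Y[None, :] >= Y[:, None]).astype(float)   # at_risk[i, j] = 1{Y_j >= Y_i}
    log_s0 = np.log(at_risk @ np.exp(eta) / len(Y))
    return float(np.mean(R * (eta - log_s0)))


def fit_cox(L, Y, R, n_iter=25):
    """Newton-Raphson maximization of the concave log partial likelihood."""
    n, p = L.shape
    at_risk = (Y[None, :] >= Y[:, None]).astype(float)
    beta = np.zeros(p)
    for _ in range(n_iter):
        w = np.exp(L @ beta)
        s0 = at_risk @ w                        # risk-set sums, shape (n,)
        s1 = at_risk @ (w[:, None] * L)         # weighted covariate sums, shape (n, p)
        l_bar = s1 / s0[:, None]                # risk-set weighted covariate means
        score = (R[:, None] * (L - l_bar)).sum(axis=0)
        hessian = np.zeros((p, p))
        for i in np.flatnonzero(R):             # only uncensored subjects contribute
            s2 = (L * (at_risk[i] * w)[:, None]).T @ L / s0[i]
            hessian -= s2 - np.outer(l_bar[i], l_bar[i])
        beta = beta - np.linalg.solve(hessian, score)
    return beta


# Simulated data (illustrative only): hazard of T given L is exp(beta_true'L).
rng = np.random.default_rng(1)
n = 300
beta_true = np.array([0.5, -0.5])
L = rng.normal(size=(n, 2))
T = rng.exponential(np.exp(-L @ beta_true))
C = rng.exponential(2.0, size=n)
Y, R = np.minimum(T, C), (T <= C).astype(int)

beta_hat = fit_cox(L, Y, R)
```

The same routine applied to the censoring indicator `1 - R` would give the
corresponding estimate for the working model of C.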



Proof of Theorem 3.2. Recall $Z^* = (\beta^{*\prime}L, \gamma^{*\prime}L)$
and $\hat Z = (\hat\beta_n'L, \hat\gamma_n'L)$. We assume one of the working
models is correct, so T and C are independent given $Z^*$ by Lemma 3.1. The
proof consists of three steps: in the first step, we show the uniform
consistency of $\hat H_{T|Z^*}(t|z)$ and thus of $S_n(t)$; in the second
step, we write $\sqrt{n}(S_n(t) - S(t))$ as a linear functional of the
empirical processes; in the third step, we apply empirical process theory to
obtain the asymptotic properties.

First, the following result holds; its proof is given in Dabrowska (1987).



Lemma A.1. For any z in the support of $Z^*$,

$$\Biggl\|\frac{P_n[K((Z^*-z)/a_n)I_{Y\ge t}]}{P_n[K((Z^*-z)/a_n)]}
 - P(Y\ge t \mid Z^*=z)\Biggr\|_{l^\infty([0,\tau])} \to 0,$$

$$\Biggl\|\frac{P_n[K((Z^*-z)/a_n)I_{Y\le t}R]}{P_n[K((Z^*-z)/a_n)]}
 - P(Y\le t, R=1 \mid Z^*=z)\Biggr\|_{l^\infty([0,\tau])} \to 0.$$

Lemma A.2. For any z in the support of $Z^*$,

$$\Biggl\|\frac{P_n[K((\hat Z-z)/a_n)I_{Y\ge t}]}{P_n[K((\hat Z-z)/a_n)]}
 - P(Y\ge t \mid Z^*=z)\Biggr\|_{l^\infty([0,\tau])} \to 0,$$

$$\Biggl\|\frac{P_n[K((\hat Z-z)/a_n)I_{Y\le t}R]}{P_n[K((\hat Z-z)/a_n)]}
 - P(Y\le t, R=1 \mid Z^*=z)\Biggr\|_{l^\infty([0,\tau])} \to 0.$$



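Proof. For convenience, we denote

$$g_n(\beta,\gamma) = \frac{1}{a_n^2}
 P_n\Bigl[K\Bigl(\frac{(\beta'L,\gamma'L)-z}{a_n}\Bigr)\Bigr].$$

We show that
$\sup_{t\in[0,\tau]}|g_n(\hat\beta_n,\hat\gamma_n) - g_n(\beta^*,\gamma^*)| \to 0$
a.s. By the property of the kernel function and the mean value theorem, we
have that

$$|g_n(\hat\beta_n,\hat\gamma_n) - g_n(\beta^*,\gamma^*)|
 \le \frac{1}{na_n^2}\sum_{i=1}^n
 \Bigl|\nabla K\Bigl(\frac{(\tilde\beta'L_i,\tilde\gamma'L_i)-z}{a_n}\Bigr)\Bigr|\,
 O\Bigl(\frac{|\hat\beta_n-\beta^*| + |\hat\gamma_n-\gamma^*|}{a_n}\Bigr),$$

where $(\tilde\beta,\tilde\gamma)$ is between $(\hat\beta_n,\hat\gamma_n)$
and $(\beta^*,\gamma^*)$. Hence, for any $z = (z_1,z_2)$ in
the support of $Z^*$,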



|5n(/3n,7n) ~ 9n{P* 
1 



<0 P 



na n 
1 



1 

— F- 

^ ^ l 



+ (/?'L l -zi) 2 /a2+(fL,-z 2 ) 2 /a2j 



na r 



nal ^ 1 



4 ^ 1 + (£*% - ^i) 2 K + (7*% - * 2 ) 2 KJ ' 

where the last step follows because $|\nabla_{x_j}\log(1 + x_1^2 + x_2^2)|$, $j = 1,2$, is
uniformly bounded and $|\tilde\beta'L - \beta^{*\prime}L| + |\tilde\gamma'L - \gamma^{*\prime}L| \le O_p(1)$. Notice that



all + {p*'L-z l f/al + in*'L-z 2 f/al 



is uniformly bounded. So sup 4g[0T] \g n 0n,%) ~ 9n{P* < °p(^/k^)- 
Similarly, we can obtain that 



sup 

te[o,T] 



n i= i 



Z-z 



-tE^ — — tE^ 



71 j = l 



•0. 



Combining this result with Lemma A.1, it is clear that the first half of
Lemma A.2 holds. The second half of Lemma A.2 can be proved similarly. □

Lemma A.3. Denote

$$\hat H_{T|Z^*}(t|z) = \int_0^t
 \frac{d_s P_n[K((\hat Z-z)/a_n)RI_{Y\le s}]}{P_n[K((\hat Z-z)/a_n)I_{Y\ge s}]}$$

and

$$\hat S_{T|Z^*}(t|z) = \prod_{s\le t}\bigl(1 - \hat H_{T|Z^*}(\{s\}|z)\bigr).$$

Then for any z in the support of $Z^*$, in probability
$\|\hat H_{T|Z^*}(t|z) - H_{T|Z^*}(t|z)\|_{l^\infty([0,\tau])} \to 0$ and
$\|\hat S_{T|Z^*}(t|z) - S_{T|Z^*}(t|z)\|_{l^\infty([0,\tau])} \to 0$.

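Proof. The first result follows from Assumption 3.3, Lemma A.2 and
the following inequality:

$$\|\hat H_{T|Z^*}(t|z) - H_{T|Z^*}(t|z)\|_{l^\infty([0,\tau])}
 = \Biggl\|\int_0^t \frac{d_s P_n[K((\hat Z-z)/a_n)RI_{Y\le s}]}{P_n[K((\hat Z-z)/a_n)I_{Y\ge s}]}
 - \int_0^t \frac{d_s E[RI_{Y\le s} \mid Z^*=z]}{E[I_{Y\ge s} \mid Z^*=z]}\Biggr\|_{l^\infty([0,\tau])}$$

$$\le \Biggl\|\frac{P_n[K((\hat Z-z)/a_n)I_{Y\le t}R]}{P_n[K((\hat Z-z)/a_n)]}
 - P(Y\le t, R=1 \mid Z^*=z)\Biggr\|_{l^\infty([0,\tau])}
 \times \Biggl\{\min_{t\in[0,\tau]}
 \frac{P_n[K((\hat Z-z)/a_n)I_{Y\ge t}]}{P_n[K((\hat Z-z)/a_n)]}\Biggr\}^{-1}$$

$$\quad + \Biggl\|\frac{P_n[K((\hat Z-z)/a_n)I_{Y\ge t}]}{P_n[K((\hat Z-z)/a_n)]}
 - P(Y\ge t \mid Z^*=z)\Biggr\|_{l^\infty([0,\tau])}
 \times \Biggl\{\min_{t\in[0,\tau]}
 \frac{P_n[K((\hat Z-z)/a_n)I_{Y\ge t}]}{P_n[K((\hat Z-z)/a_n)]}\,
 P(Y\ge t \mid Z^*=z)\Biggr\}^{-1}.$$

For the second result, we use the Duhamel equation and integration by
parts: for any $t \in [0,\tau]$,

$$|\hat S_{T|Z^*}(t|z) - S_{T|Z^*}(t|z)|
 = S_{T|Z^*}(t|z)\Biggl|\int_0^t \frac{\hat S_{T|Z^*}(u-|z)}{S_{T|Z^*}(u|z)}\,
 d\bigl(\hat H_{T|Z^*}(u|z) - H_{T|Z^*}(u|z)\bigr)\Biggr|$$

$$= S_{T|Z^*}(t|z)\Biggl|\frac{\hat S_{T|Z^*}(t-|z)}{S_{T|Z^*}(t|z)}
 \bigl(\hat H_{T|Z^*}(t|z) - H_{T|Z^*}(t|z)\bigr)
 - \int_0^t \bigl(\hat H_{T|Z^*}(u|z) - H_{T|Z^*}(u|z)\bigr)\,
 d\Bigl(\frac{\hat S_{T|Z^*}(u-|z)}{S_{T|Z^*}(u|z)}\Bigr)\Biggr|$$

$$\le \Biggl(1 + \int_0^t \frac{|d\hat S_{T|Z^*}(u-|z)|}{S_{T|Z^*}(u|z)}
 + \int_0^t \frac{\hat S_{T|Z^*}(u-|z)\,|dS_{T|Z^*}(u|z)|}{S_{T|Z^*}(u|z)^2}\Biggr)
 \max_{0\le s\le t}|\hat H_{T|Z^*}(s|z) - H_{T|Z^*}(s|z)|$$

$$\le \Biggl(1 + \frac{2}{\min_{t\in[0,\tau]}P(T>t \mid Z^*=z)^2}\Biggr)
 \max_{0\le s\le t}|\hat H_{T|Z^*}(s|z) - H_{T|Z^*}(s|z)|.$$

□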

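For the second step, we will write
$\hat H_{T|Z^*}(t|z) - H_{T|Z^*}(t|z)$, and then $S_n(t) - S(t)$, in terms of
the empirical process $(P_n - P)$. First, we obtain

$$\hat H_{T|Z^*}(t|z) - H_{T|Z^*}(t|z)
 = (P_n - P)\Biggl[\frac{a_n^{-2}K((\hat Z-z)/a_n)I_{Y\le t}R}
 {P_n[a_n^{-2}K((\hat Z-z)/a_n)I_{Y\ge y}]|_{y=Y}}\Biggr]$$

$$\quad - P\Biggl[\frac{a_n^{-2}K((\hat Z-z)/a_n)I_{Y\le t}R\,
 (P_n - P)[a_n^{-2}K((\hat Z-z)/a_n)I_{Y\ge y}]|_{y=Y}}
 {P_n[a_n^{-2}K((\hat Z-z)/a_n)I_{Y\ge y}]^2|_{y=Y}}\Biggr]$$

$$\quad + \Biggl\{P\Biggl[\frac{a_n^{-2}K((\hat Z-z)/a_n)I_{Y\le t}R}
 {P[a_n^{-2}K((\hat Z-z)/a_n)I_{Y\ge y}]|_{y=Y}}\Biggr] - H_{T|Z^*}(t|z)\Biggr\}
 = I + II + III.$$

For III, a simple transformation in the integral gives that, uniformly in z
in the support of $Z^*$ and $t \in [0,\tau]$,

$$III = \int_0^t \frac{d_u P(R=1, Y\le u \mid \hat Z=z)}{P(Y\ge u \mid \hat Z=z)}
 + O_p(a_n^2) - H_{T|Z^*}(t|z).$$

On the other hand, since by Lemma 3.1
$\int_0^t d_u P(R=1, Y\le u \mid Z^*=z)/P(Y\ge u \mid Z^*=z) = H_{T|Z^*}(t|z)$,
we perform the Taylor expansion of the above expression around
$(\beta^*,\gamma^*)$, and then III becomes

$$III = \nabla_\beta\big|_{\beta=\beta^*}\Biggl\{\int_0^t
 \frac{d_u P(Y\le u, R=1 \mid (\beta'L,\gamma^{*\prime}L)=z)}
 {P(Y\ge u \mid (\beta'L,\gamma^{*\prime}L)=z)}\Biggr\}(\hat\beta_n-\beta^*)$$

$$\quad + \nabla_\gamma\big|_{\gamma=\gamma^*}\Biggl\{\int_0^t
 \frac{d_u P(Y\le u, R=1 \mid (\beta^{*\prime}L,\gamma'L)=z)}
 {P(Y\ge u \mid (\beta^{*\prime}L,\gamma'L)=z)}\Biggr\}(\hat\gamma_n-\gamma^*)
 + O_p(a_n^2) + O_p\Bigl(\frac{1}{n}\Bigr).$$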



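For convenience, we introduce more notation:

$$h_1^n(y,r,l;\beta,\gamma,t,z) =
 \frac{a_n^{-2}K(((\beta'l,\gamma'l)-z)/a_n)I_{y\le t}\,r}
 {P_n[a_n^{-2}K(((\beta'L,\gamma'L)-z)/a_n)I_{Y\ge y}]},$$

$$h_2^n(y,l;\beta,\gamma,t,z) =
 -\frac{1}{a_n^2}K\Bigl(\frac{(\beta'l,\gamma'l)-z}{a_n}\Bigr)
 \times P\Biggl[\frac{a_n^{-2}K(((\beta'L,\gamma'L)-z)/a_n)I_{Y\le t}I_{Y\le y}R}
 {P_n[a_n^{-2}K(((\beta'L,\gamma'L)-z)/a_n)I_{Y\ge y'}]^2|_{y'=Y}}\Biggr],$$

$$B(\beta,\gamma,z,t) = \int_0^t
 \frac{d_u P(Y\le u, R=1 \mid (\beta'L,\gamma'L)=z)}
 {P(Y\ge u \mid (\beta'L,\gamma'L)=z)}.$$

After substituting this notation into the expression
$\hat H_{T|Z^*}(t|z) - H_{T|Z^*}(t|z)$, then further substituting into
$\hat S_{T|Z^*}(t|z) - S_{T|Z^*}(t|z)$ in the Duhamel equation, we have
that, uniformly in $t \in [0,\tau]$,

$$\hat S_{T|Z^*}(t|z) - S_{T|Z^*}(t|z)
 = -S_{T|Z^*}(t|z)\Biggl[(P_n - P)\Biggl[\int_0^t
 \frac{\hat S_{T|Z^*}(u-|z)}{S_{T|Z^*}(u|z)}\,
 dh_1^n(Y,R,L;\hat\beta_n,\hat\gamma_n,u,z)\Biggr]$$

$$\quad + (P_n - P)\Biggl[\int_0^t
 \frac{\hat S_{T|Z^*}(u-|z)}{S_{T|Z^*}(u|z)}\,
 dh_2^n(Y,L;\hat\beta_n,\hat\gamma_n,u,z)\Biggr]$$

$$\quad + \int_0^t \frac{\hat S_{T|Z^*}(u-|z)}{S_{T|Z^*}(u|z)}\,
 d_u\nabla_\beta B(\beta^*,\gamma^*,z,u)\,(\hat\beta_n-\beta^*)$$

$$\quad + \int_0^t \frac{\hat S_{T|Z^*}(u-|z)}{S_{T|Z^*}(u|z)}\,
 d_u\nabla_\gamma B(\beta^*,\gamma^*,z,u)\,(\hat\gamma_n-\gamma^*)\Biggr]
 + O_p(a_n^2) + O_p\Bigl(\frac{1}{n}\Bigr). \tag{A.1}$$
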

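Note that

$$\sqrt{n}(S_n(t) - S(t))
 = \sqrt{n}\bigl(P_n[\hat S_{T|Z^*}(t|\hat Z)] - P[S_{T|Z^*}(t|Z^*)]\bigr)$$

$$= \sqrt{n}(P_n - P)[\hat S_{T|Z^*}(t|\hat Z)]
 + \sqrt{n}P[\hat S_{T|Z^*}(t|\hat Z) - S_{T|Z^*}(t|Z^*)].$$

After using (A.1) and the results of Lemmas A.2 and A.3, we obtain that
uniformly in $t \in [0,\tau]$,

$$\sqrt{n}(S_n(t) - S(t))
 = \sqrt{n}(P_n - P)w_n(Y,R,L;\hat\beta_n,\hat\gamma_n,t)$$

$$\quad - P[S_{T|Z^*}(t|Z^*)\nabla_\beta B(\beta^*,\gamma^*,Z^*,t)]\sqrt{n}(\hat\beta_n-\beta^*)$$

$$\quad - P[S_{T|Z^*}(t|Z^*)\nabla_\gamma B(\beta^*,\gamma^*,Z^*,t)]\sqrt{n}(\hat\gamma_n-\gamma^*)
 + o_p(1), \tag{A.2}$$

where, with $\hat z = (\hat\beta_n'l, \hat\gamma_n'l)$,

$$w_n(y,r,l;\hat\beta_n,\hat\gamma_n,t)
 = \hat S_{T|Z^*}(t|\hat z) - S(t)$$

$$\quad - P\Biggl[S_{T|Z^*}(t|\hat Z)\int_0^t
 \frac{\hat S_{T|Z^*}(u-|\hat Z)}{S_{T|Z^*}(u|\hat Z)}\,
 dh_1^n(y,r,l;\hat\beta_n,\hat\gamma_n,u,\hat Z)\Biggr]$$

$$\quad - P\Biggl[S_{T|Z^*}(t|\hat Z)\int_0^t
 \frac{\hat S_{T|Z^*}(u-|\hat Z)}{S_{T|Z^*}(u|\hat Z)}\,
 dh_2^n(y,l;\hat\beta_n,\hat\gamma_n,u,\hat Z)\Biggr].$$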

In the third step, empirical process theory is applied to the above
expression for $\sqrt{n}(S_n(t) - S(t))$ to obtain the asymptotic properties
of $S_n(t)$. We consider the empirical process

$$\Bigl\{\sqrt{n}(P_n - P)\,w_n\Bigl(Y,R,L;\beta^* + \frac{\theta_1}{\sqrt{n}},
 \gamma^* + \frac{\theta_2}{\sqrt{n}},t\Bigr) : t\in[0,\tau],\
 \theta_1 = O_p(1),\ \theta_2 = O_p(1)\Bigr\},$$

which is indexed by $(t,\theta_1,\theta_2)$. First, we claim that uniformly
in t,

$$w_n(Y,R,L;\hat\beta_n,\hat\gamma_n,t) \to
 S_{T|Z^*}(t|Z^*) - S(t)
 - \frac{RI_{Y\le t}S_{T|Z^*}(t|Z^*)}{P(Y\ge y' \mid Z^*)|_{y'=Y}}$$

$$\quad + S_{T|Z^*}(t|Z^*)\int_0^{t\wedge Y}
 e^{H_{T|Z^*}(u|Z^*) + H_{C|Z^*}(u|Z^*)}\,dH_{T|Z^*}(u|Z^*)$$

in probability. This is true by using arguments similar to those in the
proofs of Lemmas A.2 and A.3. Second, with technical calculation we can
verify that each function in the class indexed by $(t,\theta_1,\theta_2)$
belongs to $BV[0,\tau]$ as a function of t and is Lipschitz continuous with
respect to $(\theta_1,\theta_2)$ with the
Lipschitz coefficient bounded by 0(—^ — ) in probability. Thus we can check 

each condition of Theorem 2.11.23 in van der Vaart and Wellner (1996) and
obtain that, in $l^\infty([0,\tau])$,

$$\sqrt{n}(P_n - P)w_n(Y,R,L;\hat\beta_n,\hat\gamma_n,t)
 = \sqrt{n}(P_n - P)\Biggl[S_{T|Z^*}(t|Z^*) - S(t)
 - \frac{RI_{Y\le t}S_{T|Z^*}(t|Z^*)}{P(Y\ge y' \mid Z^*)|_{y'=Y}}$$

$$\quad + S_{T|Z^*}(t|Z^*)\int_0^{t\wedge Y}
 e^{H_{T|Z^*}(u|Z^*) + H_{C|Z^*}(u|Z^*)}\,d_uH_{T|Z^*}(u|Z^*)\Biggr] + o_p(1).$$

Therefore, from (A.2) we obtain that uniformly in $t \in [0,\tau]$,

$$\sqrt{n}(S_n(t) - S(t))
 = \sqrt{n}(P_n - P)\Biggl[S_{T|Z^*}(t|Z^*) - S(t)
 - \frac{RI_{Y\le t}S_{T|Z^*}(t|Z^*)}{P(Y\ge y' \mid Z^*)|_{y'=Y}}$$

$$\quad + S_{T|Z^*}(t|Z^*)\int_0^{t\wedge Y}
 e^{H_{T|Z^*}(u|Z^*) + H_{C|Z^*}(u|Z^*)}\,d_uH_{T|Z^*}(u|Z^*)\Biggr]$$

$$\quad - P[S_{T|Z^*}(t|Z^*)\nabla_\beta B(\beta^*,\gamma^*,Z^*,t)]\sqrt{n}(\hat\beta_n-\beta^*)$$

$$\quad - P[S_{T|Z^*}(t|Z^*)\nabla_\gamma B(\beta^*,\gamma^*,Z^*,t)]\sqrt{n}(\hat\gamma_n-\gamma^*)
 + o_p(1).$$

Combining with the result of Theorem 3.1, we obtain Theorem 3.2. □

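Proof of Lemma 3.2. Obviously, the estimator $S_n(t)$ is the same as
$S_n(t;\hat\beta_n,\hat\gamma_n)$. By repeating the proof of Theorem 3.2, we
can obtain that if $|\beta-\beta^*| + |\gamma-\gamma^*| = o(1)$, then

$$S_n(t;\beta,\gamma) - S(t)
 = (P_n - P)\Biggl[S_{T|Z^*}(t|Z^*) - S(t)
 - RI_{Y\le t}S_{T|Z^*}(t|Z^*)e^{H_{T|Z^*}(Y|Z^*) + H_{C|Z^*}(Y|Z^*)}$$

$$\quad + S_{T|Z^*}(t|Z^*)\int_0^{t\wedge Y}
 e^{H_{T|Z^*}(u|Z^*) + H_{C|Z^*}(u|Z^*)}\,d_uH_{T|Z^*}(u|Z^*)\Biggr]$$

$$\quad - P\{S_{T|Z^*}(t|Z^*)[B(\beta,\gamma,Z,t) - H_{T|Z^*}(t|Z^*)]\}
 + o_p\Bigl(\frac{1}{\sqrt{n}}\Bigr),$$

where we recall
$B(\beta,\gamma,z,t) = \int_0^t d_u P(Y\le u, R=1 \mid (\beta'L,\gamma'L)=z)/P(Y\ge u \mid (\beta'L,\gamma'L)=z)$.

We especially choose $\gamma = \hat\gamma_n$ and
$\beta = \hat\beta_n + \varepsilon_n v$, where v is any constant vector in
$R^{\dim(\beta^*)}$ with norm 1. After linearizing $B(\beta,\gamma,Z,t)$
around $\beta = \beta^*$, $\gamma = \gamma^*$, we find that

$$S_n(t;\hat\beta_n + \varepsilon_n v,\hat\gamma_n) - S(t)
 = -P\{S_{T|Z^*}(t|Z^*)[B(\beta^*,\gamma^*,Z^*,t) - H_{T|Z^*}(t|Z^*)]\}$$

$$\quad - \varepsilon_n P\{S_{T|Z^*}(t|Z^*)\nabla_\beta B(\beta^*,\gamma^*,Z^*,t)\}v
 + O_p\Bigl(\frac{1}{\sqrt{n}}\Bigr) + O(\varepsilon_n^2).$$

When one of the working models is correct,

$$-P\{S_{T|Z^*}(t|Z^*)[B(\beta^*,\gamma^*,Z^*,t) - H_{T|Z^*}(t|Z^*)]\} = 0.$$

Moreover, $S_n(t) - S(t) = O_p(1/\sqrt{n})$. Therefore,

$$\frac{S_n(t;\hat\beta_n + \varepsilon_n v,\hat\gamma_n) - S_n(t)}{\varepsilon_n}
 \to_p -P\{S_{T|Z^*}(t|Z^*)\nabla_\beta B(\beta^*,\gamma^*,Z^*,t)\}v.$$

Similarly, for any constant vector v in $R^{\dim(\gamma^*)}$ with norm 1,

$$\frac{S_n(t;\hat\beta_n,\hat\gamma_n + \varepsilon_n v) - S_n(t)}{\varepsilon_n}
 \to_p -P\{S_{T|Z^*}(t|Z^*)\nabla_\gamma B(\beta^*,\gamma^*,Z^*,t)\}v.$$

So the conclusions in the lemma hold. □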
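
The proof of Lemma 3.2 amounts to a numerical-derivative recipe: perturb the
plug-in coefficient along a unit direction v by a small step and divide the
change in the estimate by that step. A minimal sketch follows, with
hypothetical names. The function `toy_estimator` is only a smooth stand-in
for the map $(\beta,\gamma) \mapsto S_n(t;\beta,\gamma)$, and the step
$n^{-1/4}$ is an illustrative choice that tends to zero more slowly than
$n^{-1/2}$; it is not a recommendation taken from the paper.

```python
import numpy as np


def directional_derivative(estimator, beta, gamma, v, eps):
    """Forward difference of estimator(beta, gamma) in beta along direction v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)                 # the lemma uses unit-norm directions
    return (estimator(beta + eps * v, gamma) - estimator(beta, gamma)) / eps


def toy_estimator(beta, gamma):
    """Smooth stand-in for S_n(t; beta, gamma), so the exact slope is known."""
    return float(np.exp(-np.sum(beta ** 2) - np.sum(gamma ** 2)))


n = 10_000
eps_n = n ** (-0.25)                          # illustrative step size, equal to 0.1 here
beta_hat, gamma_hat = np.array([0.3, -0.2]), np.array([0.1])
v = np.array([1.0, 0.0])

approx = directional_derivative(toy_estimator, beta_hat, gamma_hat, v, eps_n)
exact = -2.0 * beta_hat[0] * toy_estimator(beta_hat, gamma_hat)
```

Repeating the call with v ranging over the coordinate directions of
$\beta$, and then perturbing $\gamma$ in the same way, gives all the
derivative terms that appear in the expansion of Theorem 3.2.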

Acknowledgments. This work is part of my Ph.D. dissertation advised by 
Professor Susan Murphy at the University of Michigan. I owe many thanks 
to her for numerous discussions and helpful comments. I also thank an As- 
sociate Editor and a referee for their valuable suggestions. 



REFERENCES 

Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting
    processes: A large sample study. Ann. Statist. 10 1100-1120. MR673646
Bennett, S. (1983). Analysis of survival data by the proportional odds
    model. Statistics in Medicine 2 273-277.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993).
    Efficient and Adaptive Estimation for Semiparametric Models. Johns
    Hopkins Univ. Press, Baltimore, MD.
Cox, D. R. (1972). Regression models and life-tables (with discussion).
    J. Roy. Statist. Soc. Ser. B 34 187-220. MR341758
Dabrowska, D. M. (1987). Nonparametric regression with censored survival
    time data. Scand. J. Statist. 14 181-197.
Gill, R. D., van der Laan, M. J. and Robins, J. (1997). Locally efficient
    estimation in censored data models with high-dimensional covariate
    vectors or time-dependent marker processes. Unpublished manuscript.
    MR932943
Little, R. J. A. (1986). Survey nonresponse adjustments for estimates of
    means. Internat. Statist. Review 54 139-157.
Robins, J. M., Rotnitzky, A. and van der Laan, M. (2000). Comment on "On
    profile likelihood," by S. Murphy and A. W. van der Vaart. J. Amer.
    Statist. Assoc. 95 477-482.
Rotnitzky, A. and Robins, J. M. (1995). Semiparametric regression
    estimation in the presence of dependent censoring. Biometrika 82
    805-820. MR1380816
Rubin, D. B. (1976). Inference and missing data (with discussion).
    Biometrika 63 581-592. MR455196
Scharfstein, D. O. and Robins, J. M. (2002). Estimation of the failure time
    distribution in the presence of informative censoring. Biometrika 89
    617-634. MR1929167
Tibshirani, R. and Hastie, T. (1987). Local likelihood estimation. J. Amer.
    Statist. Assoc. 82 559-567. MR898359
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and
    Empirical Processes With Applications to Statistics. Springer, New York.
    MR1385671

Department of Biostatistics 
University of North Carolina 
Chapel Hill, North Carolina 27599-7420 
USA 

E-mail: dzeng@bios.unc.edu



